Native Language Identification on Text and Speech
نویسندگان
چکیده
This paper presents an ensemble system combining the output of multiple SVM classifiers to native language identification (NLI). The system was submitted to the NLI Shared Task 2017 fusion track which featured students essays and spoken responses in form of audio transcriptions and iVectors by non-native English speakers of eleven native languages. Our system competed in the challenge under the team name ZCD and was based on an ensemble of SVM classifiers trained on character n-grams achieving 83.58% accuracy and ranking 3 in the shared task.
منابع مشابه
Speech-like Pragmatic Markers in Argumentative Essays Written by Iranian EFL Students and Native English Speaking Students
In this study, the use of speech-like pragmatic markers in Iranian EFL students’ academic writing was investigated. Speech-like pragmatic markers, such as I think, well, I guess, actually, anyway, anyhow, etc. are linguistic components that are more specific to conversation than writing, and writers may wrongly include them in their academic writing. To examine the students’ use of speech-like ...
متن کاملSpeech-like Pragmatic Markers in Argumentative Essays Written by Iranian EFL Students and Native English Speaking Students
In this study, the use of speech-like pragmatic markers in Iranian EFL students’ academic writing was investigated. Speech-like pragmatic markers, such as I think, well, I guess, actually, anyway, anyhow, etc. are linguistic components that are more specific to conversation than writing, and writers may wrongly include them in their academic writing. To examine the students’ use of speech-like ...
متن کاملCan characters reveal your native language? A language-independent approach to native language identification
A common approach in text mining tasks such as text categorization, authorship identification or plagiarism detection is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. In this work, an approach that uses character n-grams as features is proposed for the task of native language identification. Instead of doing standard feature selection,...
متن کاملPerceptual confusions of American-English vowels and consonants by native Arabic bilinguals.
This study investigated the perception of American-English (AE) vowels and consonants by young adults who were either (a) early Arabic-English bilinguals whose native language was Arabic or (b) native speakers of the English dialects spoken in the United Arab Emirates (UAE), where both groups were studying. In a closed-set format, participants were asked to identify 12 AE vowels presented in /h...
متن کاملExploring Optimal Voting in Native Language Identification
We describe the submissions entered by the National Research Council Canada in the Native Language Identification Shared Task 2017. We mainly explored the use of voting, and various ways to optimize the choice and number of voting systems. We also explored the use of features that rely on no linguistic preprocessing. Long ngrams of characters obtained from raw text turned out to yield the best ...
متن کاملImproving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech
Identifying a speaker’s native language with his speech in a second language is useful for many human-machine voice interface applications. In this paper, we use a sub-phone-based i-vector approach to identify non-native English speakers’ native languages by their English speech input. Time delay deep neural networks (TDNN) are trained on LVCSR corpora for improving the alignment of speech utte...
متن کامل